TIME/TIMESTAMP W/O TIME ZONE semantics fix - continuation (v3) #10193
Currently this is #9385 rebased on master + fix for #9385 (comment) cc @haozhun. Before you review, we will still retract the Hive changes from this.
5166901
to
08cb1fc
Compare
@haozhun Here is the prefix of https://github.com/starburstdata/presto/tree/epic/timestampp/3, which skips the Hive connector changes as we discussed. Feel free to review when you have a chance.
08cb1fc
to
0ee7a18
Compare
Given the current state of this work stream, everyone agrees that it's better to merge the existing work and make changes later. As a result, my approach to reviewing this PR is going to be more lenient on general design. If I have a comment that I consider better addressed in a follow-up, I'll point that out. Those should eventually go in #10326. I would like to emphasize that design choices merged from this work stream are subject to change in later commits.
FYI, Your commits are not correctly ordered. Notes:
Commits:
I rebased on current master without changing anything except resolving a few trivial conflicts. Below I am addressing the must-have bullets (and some TODOs as well). Big thanks for your thorough review
I added a squash commit, so that I don't mess with the commit history yet.
I thought this way about TIME WITH TIME ZONE (but ultimately decided not to, to avoid even more chaos). What representation would you like to see for TIME?
In general, I agree we should deprecate methods, switches, flags just because a related functionality is going to be removed.
My initial reaction to using The stack representation of a timestamp can be interpreted as referring to UTC: "an instant of time (millis) which in UTC time zone corresponds to the timestamp value being represented". Also, the code actually leverages this fact, eg.
Done
Please help me understand. It seemed as unnecessary, as the
That comment was addressed more explicitly, in
I added a TODO note, not to mess with commit history at this point.
It should be the same, but existing code didn't have provisions to ensure that.
They represent two different things. See
I removed most of it.
The only reason for having this is to facilitate the transition.
This is only for constructing session start time (
Done
I instead made
Now the two test classes are consistent.
The commit affects tests for current date, local time(stamp), current time(stamp). (The current date part is signalled by "+ Improve test coverage" in the commit message).
Please elaborate? Is it about
It's not going away, is it?
Given
Fixed.
Fixed
Done
They are closely related: both serve as a normalization of a TIME / TIME WITH TIME ZONE value into the correct range. See how parallel their use is in
Fixed
To be sure, which Done.
Let's fix it when we remove political zones from TIME WITH TIME ZONE. I added a clarification comment in the code.
Fixed
Fixed
Fixed
Done
Fixed
I am not sure I follow.
Done
I didn't. Let's remove political time zones in a follow up.
Ditto. (Fixed)
Why? This was done to highlight the fact that there are some shared test cases, but there are also some special test cases. This helps in that the corresponding tests can be easily discovered by someone reading the code.
Yes, this is not needed since we know that
As above.
Thanks
Good point.
I called them both
Right.
Fixed
I'm not.
0ee7a18
to
317e8b6
Compare
For your commits I didn't mention, they are good. For your comments I didn't mention, I agree with them. Sorry but otherwise there would be too many items here. I consulted the code only to understand your response and my own comments. I didn't read any commits carefully. I depended on your response in the previous comment.
"Fix current_time timezone offset" - looks good with comments
Be aware that the empty commit might disappear when you do a rebase. (That's what happened to me.)
"Introduce new TIME/TIMESTAMP semantics to SQL types" - comments below
In a TODO item that follows my original comment, I was arguing for adding the word
I requested two things in this comment:
Therefore, I change my request to calling both to
In addition, following this request, my other two comments follow:
"Add new TIME/TIMESTAMP parsing/printing to DateTimeUtils" - comments below
I definitely should have been more clear. It took me a while to figure it out myself. After addressing the next comment (quoted below), you will then need it. The
I disagree. Please change parseTimestampWithoutTimeZone. Here is my reasoning. Let's first figure out what are the methods we have in
When I reviewed last time, I spent a long time to figure these out. Now, once that's figured out, the supposed behavior of the new In addition, it's ok (although not ideal) for
"Use legacy-flag aware wrappers for SqlTime/SqlTimestamp cration in tests" - comments
I make these all TODO items. I also added a TODO item to add comments to put some of your answers here into comments.
"Fix current_time, localtime & localtimestamp semantics"
The reason it would make more sense to use Kathmandu is because tests a few lines below the I don't understand your statement "By using UTC we have more visible correlation between the date-time component used to construct those millis and the expected value in the test.". Being able to have correlation is the exact reason I want Kathmandu. Using UTC provides no visible correlation between the
Sorry, the original comment was indented wrong. They are meant to be subpoints of "It's really hard to reason about TIME WITH TIME ZONE with political zone in the picture. It's up to you to pick one of the two alternatives below." I was arguing against having anything related to TIME WITH TIME ZONE depend on isLegacyTimestamp. I was suggesting to either 1) not change current_time, or 2) change it to the new behavior, without checking isLegacyTimestamp. But I don't feel strongly. After all, support for political zones in TIME WITH TIME ZONE will be removed before isLegacyTimestamp is considered ready for use.
"Introduce new TIME/TIMESTAMP semantics to scalar functions"
Your
"Add new date time semantics to date time cast operators"
From the perspective of their purpose, yes, they are similar. One normalizes TIME, the other normalizes TIME WITH TIME ZONE. But from the perspective of what they do, no, I don't think they are the same. Only one of them is doing Given their current name is If their names were I have put the rename in the TODO list below. I think it's reasonable to keep it as is for now to avoid messing up the commits.
In an earlier comment, I explained why I believe that the logic for allowing both strings with and without time zone should be in DateTimeUtils. Once
I made this a TODO item. Don't worry about it for now. I think it's a bad idea to have the TIMESTAMP change based on which class it is. For the sake of sanity, I propose some rules for these tests. Rule 1:
Applying this rule, the
Here, I think it would make sense to rename For the sake of sanity, I propose rule 2 (which is really just an emphasis of rule 1):
From practical perspective, having some methods delegate to From the logical perspective, it will also make tests easier to find when there isn't hidden structures. I further propose rule 3 (Don't worry about it now. I have put it in TODO.)
Please change TestTimestampWithTimeZone to avoid using super. Put In addition, the additional test cases you added for TestTimestampWithTimeZone should also show up in TestTimestampWithTimeZoneLegacy, even though the expected value will likely be different.
"Introduce new TIMESTAMP semantics to to_iso8601 scalar" - looks good with comments
This is problematic. You have Please rename to
Outstanding TODOs: https://gist.github.com/haozhun/ec833dd76b62a7df020a3b77ddf56a65
"Fix current_time timezone offset"
Yeah.. In short, the following commits are NEW
"Introduce new TIME/TIMESTAMP semantics to SQL types"
I added this as an additional commit rather than reworking the existing ones.
"Add new TIME/TIMESTAMP parsing/printing to DateTimeUtils"
I agree that methods are not consistently named. I renamed However, I am not convinced about the desired semantics of Also, that is also a behavior that I want these methods to have. We need elasticity in two places
Now, assigning type uses Thus, elasticity in
"Fix current_time, localtime & localtimestamp semantics"
I see this now.
Yes, that's me thinking. What we have now is, by necessity, an approximation of the final solution. Let's revisit TIME WITH TIME ZONE as a follow-up.
"Introduce new TIME/TIMESTAMP semantics to scalar functions"
Thanks! I appreciate your feedback :)
"Add new date time semantics to date time cast operators"
I like this approach.
Note: when doing these renames, we should also change
I think I like tests delegating to
Would it be OK to put this on a TODO list as well?
317e8b6
to
c3437c9
Compare
I feel pretty strongly on the topic of I reiterated the "consistency" point. But my strong feeling is mostly about "utility".
There's also
Earlier, I conceded that
This talks about the first half of bullet point 1.
This talks about the second half of bullet point 1. The legacy
If I'm reading this correctly, this is repeating "that is a behavior that I want these methods to have". It looks like you forgot bullet point 2. There is one more reason here. I didn't mention it because I thought my other arguments were sufficient. With your changes,
An accurate name would be There are 3 callsites of
Therefore, from a utility perspective, it's unnecessary to go down the path of having separate methods for By the way, I would like to see
c822149
to
b928ea3
Compare
@haozhun I applied the changes as discussed, and I added some (hopefully) clarifying comments for the parse methods. Please have a look.
      */
     public static long parseTimestampWithoutTimeZone(String value)
     {
-        return TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER.parseMillis(value);
+        LocalDateTime localDateTime = TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER.parseLocalDateTime(value);
+        return localDateTime.toDateTime(UTC).getMillis();
All we need is org.joda.time.LocalDateTime#getLocalMillis, but it's not accessible. @haozhun any better idea than what I have here?
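The idea in the diff above (parse a local date-time, then read it as if it were in UTC to obtain "local millis") can be sketched with `java.time` instead of Joda. This is only an illustration under stated assumptions: the class name `LocalMillisSketch`, the formatter and the pattern are hypothetical and are not the Presto `DateTimeUtils` code.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical illustration; not the actual DateTimeUtils implementation.
final class LocalMillisSketch {
    // Assumed format: date and time, optionally followed by a space and a zone id.
    // The zone, when present, is parsed but deliberately ignored below.
    private static final DateTimeFormatter WITH_OR_WITHOUT_ZONE = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss.SSS")
            .optionalStart().appendLiteral(' ').appendZoneId().optionalEnd()
            .toFormatter();

    // Returns the millis of the local date-time interpreted as if it were in UTC.
    static long parseTimestampWithoutTimeZone(String value) {
        LocalDateTime local = LocalDateTime.parse(value, WITH_OR_WITHOUT_ZONE);
        return local.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(parseTimestampWithoutTimeZone("1970-01-01 00:00:01.500"));
        System.out.println(parseTimestampWithoutTimeZone("1970-01-02 00:00:00.000 Asia/Kathmandu"));
    }
}
```

In this sketch both inputs produce a value that depends only on the date and time fields, which is the property the thread asks of `parseTimestampWithoutTimeZone`.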
AT TIME ZONE does not take into account the fact that TIME WITH TIME ZONE does not store a real UTC millisecond value in its millisUtc field. In fact, this field contains millisUtc computed with the time zone offset that was valid on 1970-01-01. Such a representation makes it simple to represent a local time together with a time zone id. That means that a TIME WITH TIME ZONE that represents e.g. '10:00:00.000 Asia/Kathmandu' will always represent this exact value. However, the mapping of such a value to other time zones (including UTC) may differ over time. E.g. Asia/Kathmandu switched its time zone offset in 1986 from +5:30 to +5:45. The result of a query like: `SELECT time_with_tz_column FROM table;` will always be the same, however: `SELECT time_with_tz_column AT TIME ZONE 'UTC' FROM table;` will yield a different value in 1980 and 2000 after the changes from this commit. This is done to use the current offset of the time zone, as a function that is stuck at the 1970-01-01 offsets seems useless. This is not a perfect solution and is not fully aligned with the standard, but the standard behavior cannot be achieved with the current TIME WITH TIME ZONE representation, as we are not able to read the time zone offset from TIME WITH TIME ZONE itself (at least not in all cases).
Using this allows writing simpler tests and avoids intermittent errors caused by DST or changes in time zone policies. Without this change, some tests would need to reimplement too much of the implementation's logic, making them less useful.
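The offset change mentioned in the commit message above can be checked against the JDK's own time zone rules. The following is a small sketch using `java.time`; the class and method names are hypothetical, and it only illustrates why the same local time maps to different UTC times depending on which offset is applied. It does not reproduce Presto's TIME WITH TIME ZONE representation.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration of a zone whose offset changed over time.
final class KathmanduOffsetSketch {
    private static final ZoneId KATHMANDU = ZoneId.of("Asia/Kathmandu");

    // Offset of Asia/Kathmandu in force at 00:00 UTC on January 1 of the given year.
    static ZoneOffset offsetAtStartOf(int year) {
        Instant instant = LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant();
        return KATHMANDU.getRules().getOffset(instant);
    }

    // UTC wall-clock time for a local wall-clock time under the given offset.
    static LocalTime toUtc(LocalTime local, ZoneOffset offset) {
        return local.minusSeconds(offset.getTotalSeconds());
    }

    public static void main(String[] args) {
        LocalTime local = LocalTime.of(10, 0);
        for (int year : new int[] {1970, 1980, 2000}) {
            ZoneOffset offset = offsetAtStartOf(year);
            System.out.println(year + ": offset " + offset + ", 10:00 local is " + toUtc(local, offset) + " UTC");
        }
    }
}
```

With the 1970 and 1980 offsets (+05:30), 10:00 local corresponds to 04:30 UTC; with the 2000 offset (+05:45), it corresponds to 04:15 UTC.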
For consistency with `parseTimestampLiteral`.
3b91d67
to
2e6191c
Compare
    {
        LocalDateTime localDateTime = TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER.parseLocalDateTime(value);
        try {
            return (Long) getLocalMillis.invoke(localDateTime);
invokeExact is @PolymorphicSignature, thus can return primitives directly. If you change (Long) to (long), you will be able to avoid boxing, and you will be able to use the more efficient invokeExact.
return (long) getLocalMillis.invokeExact(localDateTime);
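A self-contained sketch of the suggestion above, assuming a hypothetical `LocalValue` class as a stand-in for a type whose `getLocalMillis` accessor is not accessible. The handle is obtained once via reflection and then called with `invokeExact`; the `(long)` cast tells the compiler that the call site returns a primitive, so no boxing takes place. None of the names below come from the Presto code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

// Hypothetical illustration; LocalValue stands in for a class with a non-public accessor.
final class InvokeExactSketch {
    static final class LocalValue {
        private final long localMillis;

        LocalValue(long localMillis) {
            this.localMillis = localMillis;
        }

        private long getLocalMillis() {
            return localMillis;
        }
    }

    private static final MethodHandle GET_LOCAL_MILLIS;

    static {
        try {
            Method method = LocalValue.class.getDeclaredMethod("getLocalMillis");
            method.setAccessible(true);
            // Resulting handle type is (LocalValue)long.
            GET_LOCAL_MILLIS = MethodHandles.lookup().unreflect(method);
        }
        catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long localMillis(LocalValue value) {
        try {
            // The static argument type and the (long) cast must match the handle type exactly.
            return (long) GET_LOCAL_MILLIS.invokeExact(value);
        }
        catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(localMillis(new LocalValue(42L)));
    }
}
```

Note that `invokeExact` throws `WrongMethodTypeException` if the call site's argument or return types differ from the handle's type, for example when the result is cast to `(Long)` instead of `(long)`.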
2e6191c
to
eb3d770
Compare
AC
Congratulations!
GJ & GZ all. I seriously thought that would never happen :-)
Part of #7122, supersedes #9385